Food Hygiene Statistical Analysis

This report fulfils a request from politicians and managers of the Food Standards Agency by performing the specific analysis they asked for.

The data provided contains one row for each local authority in England, Wales and Northern Ireland, with counts of the establishments within that authority. Establishments within each local authority are rated for their potential impact on public health.

Data Dictionary

| Variable Name | Description |
| --- | --- |
| Country | The country in which the local authority is located. |
| LAType | The local authority's category. |
| LAName | The local authority's name. |
| Totalestablishments(includingnotyetrated&outside) | The total number of establishments, including those outside the programme and those whose intervention rating has not yet been determined. |
| Total%ofInterventionsachieved(premisesratedA-E) | The overall percentage of interventions achieved for premises rated A to E. |
| Total%ofInterventionsachieved-premisesratedA | The percentage of interventions achieved for A-rated premises. |
| Total%ofInterventionsachieved-premisesratedB | The percentage of interventions achieved for B-rated premises. |
| Total%ofInterventionsachieved-premisesratedC | The percentage of interventions achieved for C-rated premises. |
| Total%ofInterventionsachieved-premisesratedD | The percentage of interventions achieved for D-rated premises. |
| Total%ofInterventionsachieved-premisesratedE | The percentage of interventions achieved for E-rated premises. |
| Aratedestablishments | The number of establishments with an A rating. |
| Bratedestablishments | The number of establishments with a B rating. |
| Cratedestablishments | The number of establishments with a C rating. |
| Dratedestablishments | The number of establishments with a D rating. |
| Eratedestablishments | The number of establishments with an E rating. |
| ProfessionalFullTimeEquivalentPosts-occupied * | The number of professional full-time equivalent posts that are currently occupied. |

Section 1

Read Data

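# Import the data set, then check its structure and summary
library(tidyverse)  # provides read_csv(), the dplyr verbs and ggplot2
library(plotly)     # provides ggplotly()
df <- read_csv("2019-20-enforcement-data-food-hygiene.csv")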
## Rows: 353 Columns: 36
## ── Column specification ────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): Country, LAType, LAName, Total%ofBroadlyCompliantestablishments-A, Total%ofInterventio...
## dbl (30): Totalestablishments(includingnotyetrated&outside), Establishmentsnotyetratedforinterve...
## num  (1): TotalnumberofestablishmentssubjecttoWrittenwarnings
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
print(head(df))
## # A tibble: 6 × 36
##   Country LAType      LAName Total…¹ Estab…² Estab…³ Total…⁴ Total…⁵ Arate…⁶ Total…⁷ Brate…⁸ Total…⁹
##   <chr>   <chr>       <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>     <dbl>   <dbl>
## 1 England District C… Adur …    1478      24       0    97.2    95.6       3 33.33        39    69.2
## 2 England District C… Aller…    1316      29      74    97.2    94.9       2 50           26    76.9
## 3 England District C… Amber…    1112       1       0    97.5    97.4       2 50           39    64.1
## 4 England District C… Arun      1208      44       1    97.7    94.1       3 0            28    82.1
## 5 England District C… Ashfi…     905      26       1    96.7    93.9       1 0            31    77.4
## 6 England District C… Ashfo…    1132       0       0    98.6    98.6       5 20           15    66.7
## # … with 24 more variables: Cratedestablishments <dbl>,
## #   `Total%ofBroadlyCompliantestablishments-C` <dbl>, Dratedestablishments <dbl>,
## #   `Total%ofBroadlyCompliantestablishments-D` <dbl>, Eratedestablishments <dbl>,
## #   `Total%ofBroadlyCompliantestablishments-E` <dbl>,
## #   `Total%ofInterventionsachieved(premisesratedA-E)` <dbl>,
## #   `Total%ofInterventionsachieved-premisesratedA` <chr>,
## #   `Total%ofInterventionsachieved-premisesratedB` <dbl>, …
print(str(df))
## spc_tbl_ [353 × 36] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Country                                                                                              : chr [1:353] "England" "England" "England" "England" ...
##  $ LAType                                                                                               : chr [1:353] "District Council" "District Council" "District Council" "District Council" ...
##  $ LAName                                                                                               : chr [1:353] "Adur and Worthing" "Allerdale" "Amber Valley" "Arun" ...
##  $ Totalestablishments(includingnotyetrated&outside)                                                    : num [1:353] 1478 1316 1112 1208 905 ...
##  $ Establishmentsnotyetratedforintervention                                                             : num [1:353] 24 29 1 44 26 0 58 40 41 84 ...
##  $ Establishmentsoutsidetheprogramme                                                                    : num [1:353] 0 74 0 1 1 0 214 39 0 42 ...
##  $ Total%ofBroadlyCompliantestablishmentsratedA-E                                                       : num [1:353] 97.2 97.2 97.5 97.7 96.7 ...
##  $ Total%ofBroadlyCompliantestablishments(includingnotyetrated)                                         : num [1:353] 95.6 94.9 97.4 94.1 93.9 ...
##  $ Aratedestablishments                                                                                 : num [1:353] 3 2 2 3 1 5 1 4 1 4 ...
##  $ Total%ofBroadlyCompliantestablishments-A                                                             : chr [1:353] "33.33" "50" "50" "0" ...
##  $ Bratedestablishments                                                                                 : num [1:353] 39 26 39 28 31 15 20 44 31 36 ...
##  $ Total%ofBroadlyCompliantestablishments-B                                                             : num [1:353] 69.2 76.9 64.1 82.1 77.4 ...
##  $ Cratedestablishments                                                                                 : num [1:353] 227 243 179 211 145 125 270 219 96 190 ...
##  $ Total%ofBroadlyCompliantestablishments-C                                                             : num [1:353] 91.2 90.1 93.8 94.3 89.7 ...
##  $ Dratedestablishments                                                                                 : num [1:353] 592 469 432 483 353 453 555 626 186 519 ...
##  $ Total%ofBroadlyCompliantestablishments-D                                                             : num [1:353] 99 99.4 99.5 98.5 98.3 ...
##  $ Eratedestablishments                                                                                 : num [1:353] 593 473 459 438 348 534 628 1030 219 525 ...
##  $ Total%ofBroadlyCompliantestablishments-E                                                             : num [1:353] 99.8 100 100 100 100 ...
##  $ Total%ofInterventionsachieved(premisesratedA-E)                                                      : num [1:353] 96.1 90.6 88.9 94 80.7 ...
##  $ Total%ofInterventionsachieved-premisesratedA                                                         : chr [1:353] "100" "100" "100" "100" ...
##  $ Total%ofInterventionsachieved-premisesratedB                                                         : num [1:353] 100 98.3 95.1 96.3 100 ...
##  $ Total%ofInterventionsachieved-premisesratedC                                                         : num [1:353] 95.5 89.7 97 94.4 78.8 ...
##  $ Total%ofInterventionsachieved-premisesratedD                                                         : num [1:353] 96 93 91.8 92.6 85.3 ...
##  $ Total%ofInterventionsachieved-premisesratedE                                                         : num [1:353] 94 85.1 72.3 95.5 68.3 ...
##  $ Total%ofInterventionsachieved-premisesnotyetrated                                                    : num [1:353] 100 100 100 95.4 79.6 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure                        : num [1:353] 5 0 0 2 1 0 0 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood       : num [1:353] 4 0 0 0 0 0 0 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence: num [1:353] 0 0 0 0 0 0 1 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice       : num [1:353] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder                        : num [1:353] 0 0 0 0 0 0 1 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution                           : num [1:353] 0 0 1 0 0 0 0 0 0 0 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices               : num [1:353] 3 6 11 3 4 0 3 2 1 2 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices         : num [1:353] 0 0 0 0 0 0 0 0 0 0 ...
##  $ TotalnumberofestablishmentssubjecttoWrittenwarnings                                                  : num [1:353] 323 413 515 386 252 224 223 152 179 175 ...
##  $ Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded                   : num [1:353] 0 0 1 0 0 0 0 2 0 0 ...
##  $ ProfessionalFullTimeEquivalentPosts-occupied *                                                       : num [1:353] 5 4 3.5 4 2 4.65 2.5 5 2 4.2 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   LAType = col_character(),
##   ..   LAName = col_character(),
##   ..   `Totalestablishments(includingnotyetrated&outside)` = col_double(),
##   ..   Establishmentsnotyetratedforintervention = col_double(),
##   ..   Establishmentsoutsidetheprogramme = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishmentsratedA-E` = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments(includingnotyetrated)` = col_double(),
##   ..   Aratedestablishments = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments-A` = col_character(),
##   ..   Bratedestablishments = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments-B` = col_double(),
##   ..   Cratedestablishments = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments-C` = col_double(),
##   ..   Dratedestablishments = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments-D` = col_double(),
##   ..   Eratedestablishments = col_double(),
##   ..   `Total%ofBroadlyCompliantestablishments-E` = col_double(),
##   ..   `Total%ofInterventionsachieved(premisesratedA-E)` = col_double(),
##   ..   `Total%ofInterventionsachieved-premisesratedA` = col_character(),
##   ..   `Total%ofInterventionsachieved-premisesratedB` = col_double(),
##   ..   `Total%ofInterventionsachieved-premisesratedC` = col_double(),
##   ..   `Total%ofInterventionsachieved-premisesratedD` = col_double(),
##   ..   `Total%ofInterventionsachieved-premisesratedE` = col_double(),
##   ..   `Total%ofInterventionsachieved-premisesnotyetrated` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices` = col_double(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices` = col_double(),
##   ..   TotalnumberofestablishmentssubjecttoWrittenwarnings = col_number(),
##   ..   `Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded` = col_double(),
##   ..   `ProfessionalFullTimeEquivalentPosts-occupied *` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr> 
## NULL
print(summary(df))
##    Country             LAType             LAName         
##  Length:353         Length:353         Length:353        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  Totalestablishments(includingnotyetrated&outside) Establishmentsnotyetratedforintervention
##  Min.   : 145.0                                    Min.   :   0.00                         
##  1st Qu.: 920.5                                    1st Qu.:  25.00                         
##  Median :1330.0                                    Median :  49.00                         
##  Mean   :1620.7                                    Mean   :  89.75                         
##  3rd Qu.:2004.5                                    3rd Qu.: 100.00                         
##  Max.   :9277.0                                    Max.   :1744.00                         
##  NA's   :6                                         NA's   :6                               
##  Establishmentsoutsidetheprogramme Total%ofBroadlyCompliantestablishmentsratedA-E
##  Min.   :  0.00                    Min.   : 74.61                                
##  1st Qu.:  0.00                    1st Qu.: 95.37                                
##  Median :  2.00                    Median : 97.13                                
##  Mean   : 49.62                    Mean   : 96.33                                
##  3rd Qu.: 39.00                    3rd Qu.: 98.19                                
##  Max.   :865.00                    Max.   :100.00                                
##  NA's   :6                         NA's   :6                                     
##  Total%ofBroadlyCompliantestablishments(includingnotyetrated) Aratedestablishments
##  Min.   :69.45                                                Min.   : 0.000      
##  1st Qu.:89.23                                                1st Qu.: 1.000      
##  Median :92.80                                                Median : 2.000      
##  Mean   :91.54                                                Mean   : 4.285      
##  3rd Qu.:95.16                                                3rd Qu.: 5.000      
##  Max.   :99.87                                                Max.   :72.000      
##  NA's   :6                                                    NA's   :6           
##  Total%ofBroadlyCompliantestablishments-A Bratedestablishments
##  Length:353                               Min.   :  2.00      
##  Class :character                         1st Qu.: 23.00      
##  Mode  :character                         Median : 39.00      
##                                           Mean   : 55.61      
##                                           3rd Qu.: 68.50      
##                                           Max.   :516.00      
##                                           NA's   :6           
##  Total%ofBroadlyCompliantestablishments-B Cratedestablishments
##  Min.   :  5.88                           Min.   :  18.0      
##  1st Qu.: 55.62                           1st Qu.: 144.0      
##  Median : 69.23                           Median : 225.0      
##  Mean   : 67.34                           Mean   : 302.6      
##  3rd Qu.: 80.15                           3rd Qu.: 376.0      
##  Max.   :100.00                           Max.   :1647.0      
##  NA's   :6                                NA's   :6           
##  Total%ofBroadlyCompliantestablishments-C Dratedestablishments
##  Min.   : 71.96                           Min.   :  56.0      
##  1st Qu.: 89.36                           1st Qu.: 307.0      
##  Median : 92.86                           Median : 462.0      
##  Mean   : 92.02                           Mean   : 553.6      
##  3rd Qu.: 95.50                           3rd Qu.: 682.0      
##  Max.   :100.00                           Max.   :3053.0      
##  NA's   :6                                NA's   :6           
##  Total%ofBroadlyCompliantestablishments-D Eratedestablishments
##  Min.   : 75.74                           Min.   :  63.0      
##  1st Qu.: 97.94                           1st Qu.: 362.5      
##  Median : 99.03                           Median : 480.0      
##  Mean   : 98.38                           Mean   : 565.3      
##  3rd Qu.: 99.60                           3rd Qu.: 667.5      
##  Max.   :100.00                           Max.   :4309.0      
##  NA's   :6                                NA's   :6           
##  Total%ofBroadlyCompliantestablishments-E Total%ofInterventionsachieved(premisesratedA-E)
##  Min.   : 79.22                           Min.   : 20.64                                 
##  1st Qu.: 99.83                           1st Qu.: 81.81                                 
##  Median :100.00                           Median : 90.82                                 
##  Mean   : 99.82                           Mean   : 86.62                                 
##  3rd Qu.:100.00                           3rd Qu.: 95.39                                 
##  Max.   :100.00                           Max.   :100.00                                 
##  NA's   :6                                NA's   :6                                      
##  Total%ofInterventionsachieved-premisesratedA Total%ofInterventionsachieved-premisesratedB
##  Length:353                                   Min.   : 50.00                              
##  Class :character                             1st Qu.: 93.53                              
##  Mode  :character                             Median : 97.75                              
##                                               Mean   : 95.25                              
##                                               3rd Qu.:100.00                              
##                                               Max.   :100.00                              
##                                               NA's   :6                                   
##  Total%ofInterventionsachieved-premisesratedC Total%ofInterventionsachieved-premisesratedD
##  Min.   : 18.37                               Min.   : 19.77                              
##  1st Qu.: 89.27                               1st Qu.: 82.00                              
##  Median : 94.97                               Median : 91.72                              
##  Mean   : 91.84                               Mean   : 86.32                              
##  3rd Qu.: 97.77                               3rd Qu.: 95.98                              
##  Max.   :100.00                               Max.   :100.00                              
##  NA's   :6                                    NA's   :6                                   
##  Total%ofInterventionsachieved-premisesratedE Total%ofInterventionsachieved-premisesnotyetrated
##  Min.   :  1.81                               Min.   :  6.56                                   
##  1st Qu.: 65.39                               1st Qu.: 85.81                                   
##  Median : 87.50                               Median : 99.75                                   
##  Mean   : 77.37                               Mean   : 90.95                                   
##  3rd Qu.: 95.90                               3rd Qu.:100.00                                   
##  Max.   :100.00                               Max.   :100.00                                   
##  NA's   :6                                    NA's   :6                                        
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure
##  Min.   : 0.000                                                               
##  1st Qu.: 0.000                                                               
##  Median : 1.000                                                               
##  Mean   : 2.712                                                               
##  3rd Qu.: 3.000                                                               
##  Max.   :62.000                                                               
##  NA's   :6                                                                    
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood
##  Min.   : 0.000                                                                                
##  1st Qu.: 0.000                                                                                
##  Median : 0.000                                                                                
##  Mean   : 1.199                                                                                
##  3rd Qu.: 1.000                                                                                
##  Max.   :52.000                                                                                
##  NA's   :6                                                                                     
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence
##  Min.   :0.00000                                                                                      
##  1st Qu.:0.00000                                                                                      
##  Median :0.00000                                                                                      
##  Mean   :0.06916                                                                                      
##  3rd Qu.:0.00000                                                                                      
##  Max.   :7.00000                                                                                      
##  NA's   :6                                                                                            
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice
##  Min.   : 0.0000                                                                               
##  1st Qu.: 0.0000                                                                               
##  Median : 0.0000                                                                               
##  Mean   : 0.7118                                                                               
##  3rd Qu.: 0.0000                                                                               
##  Max.   :42.0000                                                                               
##  NA's   :6                                                                                     
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder
##  Min.   :0.0000                                                               
##  1st Qu.:0.0000                                                               
##  Median :0.0000                                                               
##  Mean   :0.1354                                                               
##  3rd Qu.:0.0000                                                               
##  Max.   :6.0000                                                               
##  NA's   :6                                                                    
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution
##  Min.   : 0.0000                                                           
##  1st Qu.: 0.0000                                                           
##  Median : 0.0000                                                           
##  Mean   : 0.4409                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :14.0000                                                           
##  NA's   :6                                                                 
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices
##  Min.   : 0.000                                                                        
##  1st Qu.: 1.000                                                                        
##  Median : 4.000                                                                        
##  Mean   : 7.527                                                                        
##  3rd Qu.: 9.000                                                                        
##  Max.   :77.000                                                                        
##  NA's   :6                                                                             
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices
##  Min.   :0.0000                                                                              
##  1st Qu.:0.0000                                                                              
##  Median :0.0000                                                                              
##  Mean   :0.3314                                                                              
##  3rd Qu.:0.0000                                                                              
##  Max.   :9.0000                                                                              
##  NA's   :6                                                                                   
##  TotalnumberofestablishmentssubjecttoWrittenwarnings
##  Min.   :  30.0                                     
##  1st Qu.: 195.0                                     
##  Median : 340.0                                     
##  Mean   : 437.0                                     
##  3rd Qu.: 543.5                                     
##  Max.   :3061.0                                     
##  NA's   :6                                          
##  Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded
##  Min.   : 0.0000                                                                   
##  1st Qu.: 0.0000                                                                   
##  Median : 0.0000                                                                   
##  Mean   : 0.6657                                                                   
##  3rd Qu.: 1.0000                                                                   
##  Max.   :25.0000                                                                   
##  NA's   :6                                                                         
##  ProfessionalFullTimeEquivalentPosts-occupied *
##  Min.   : 0.65                                 
##  1st Qu.: 2.50                                 
##  Median : 3.41                                 
##  Mean   : 4.10                                 
##  3rd Qu.: 5.00                                 
##  Max.   :22.13                                 
##  NA's   :6

EDA

The data contains the text codes NP and NR. According to the Food Hygiene Data supporting notes, NP means that no premises were supplied at this risk level, and NR means that no interventions are due or reported. These codes are replaced with the numerical values 0 (for NP) and 100 (for NR), and the affected columns are then converted to a numeric data type.

Six rows for local authorities in England had empty fields. They were removed because they held no data in any of those fields.

# Type casting: replace the text codes, then convert the columns to numeric
# NR (no interventions due or reported) is treated as 100% achieved
df$`Total%ofInterventionsachieved-premisesratedA`[df$`Total%ofInterventionsachieved-premisesratedA` %in% "NR"] <- "100"
df$`Total%ofInterventionsachieved-premisesratedA` <- as.numeric(df$`Total%ofInterventionsachieved-premisesratedA`)
# NP (no premises at this risk level) is treated as 0%
df$`Total%ofBroadlyCompliantestablishments-A`[df$`Total%ofBroadlyCompliantestablishments-A` %in% "NP"] <- "0"
df$`Total%ofBroadlyCompliantestablishments-A` <- as.numeric(df$`Total%ofBroadlyCompliantestablishments-A`)

# Remove rows that contain missing values
df <- na.omit(df)
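
The recoding step described above can also be wrapped in a small helper so that both columns are treated the same way. The following is only a sketch in base R: the function name `recode_codes` and the toy vector are hypothetical and do not appear in the original analysis. The mapping of NP to 0 and NR to 100 follows the choice stated in this report.

```r
# Hypothetical helper (not part of the original analysis):
# replace the documented text codes, then convert to numeric.
recode_codes <- function(x, codes = c(NP = "0", NR = "100")) {
  x <- as.character(x)
  hit <- x %in% names(codes)        # %in% returns FALSE, not NA, for missing values
  x[hit] <- unname(codes[x[hit]])   # look up the replacement for each code
  as.numeric(x)
}

# Toy column for illustration only; these are not rows from the real file
toy <- c("33.33", "NP", "50", NA, "NR")
recoded <- recode_codes(toy)
print(recoded)
```

Missing values stay missing after the conversion, so a later `na.omit()` call would still drop those rows.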

Analysis of establishments that successfully respond to intervention actions

## # A tibble: 6 × 7
## # Groups:   Country [3]
##   Country          LAType                        sumA  sumB  sumC  sumD  sumE
##   <fct>            <chr>                        <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 England          District Council               403  5990 32456 73627 80226
## 2 England          London Borough                 398  3809 18087 27932 19161
## 3 England          Metropolitan Borough Council   323  3744 19142 34441 31933
## 4 England          Unitary Authority              241  3954 19246 43951 43555
## 5 Northern Ireland NI Unitary Authority            18   492  3529  6563  7799
## 6 Wales            Welsh Unitary Authority        104  1307 12546  5574 13477

England has the highest number of establishments. The table above gives the total number of establishments in each rating category (A to E) for each country and local authority type.
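
The code that produced this table is not shown in the report. A table of this shape could be built with a grouped sum, as in the sketch below. It assumes the dplyr package and uses a small toy data frame with made-up values; only the column names `Country`, `LAType`, `Aratedestablishments` and `Bratedestablishments` are taken from the data set. The result name `rated_totals` is hypothetical.

```r
library(dplyr)

# Toy rows standing in for the cleaned data frame (values are illustrative only)
toy <- data.frame(
  Country = c("England", "England", "Wales"),
  LAType = c("District Council", "District Council", "Welsh Unitary Authority"),
  Aratedestablishments = c(3, 2, 4),
  Bratedestablishments = c(39, 26, 10)
)

# Sum the rated establishments within each country and local authority type
rated_totals <- toy %>%
  group_by(Country, LAType) %>%
  summarise(
    sumA = sum(Aratedestablishments),
    sumB = sum(Bratedestablishments),
    .groups = "drop"   # assumption: return an ungrouped result
  )
print(rated_totals)
```

The same pattern extends to the C, D and E columns by adding one `sum()` line per rating.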

Establishments Rated Broadly Compliant

# Plot the distribution of the % of broadly compliant establishments, coloured by country
plot1 <- ggplot(df, aes(`Total%ofBroadlyCompliantestablishmentsratedA-E`)) +
  geom_histogram(aes(fill = Country), colour = "black", binwidth = 0.3) +
  labs(x = "Total % of Broadly Compliant Establishments Rated A-E",
       title = "Distribution of Broadly Compliant Establishments")

ggplotly(plot1)

The graph shows the distribution of the percentage of establishments rated broadly compliant. In most local authorities, around 90% to 100% of establishments are rated broadly compliant. In 4 local authorities in England, 100% of establishments are rated broadly compliant. In 11 local authorities, only about 70-90% of establishments are rated broadly compliant.

# Totals for each type of formal enforcement action, by country and LAType

D3 <- df
D3$Country <- as.factor(D3$Country)
D3$LAType <- as.factor(D3$LAType)

Inve <- D3 %>%
  group_by(Country, LAType) %>%
  dplyr::summarise(
    Voluntaryclosure = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Voluntaryclosure`),
    Seizure = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Seizure,detention&surrenderoffood`),
    Suspension = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Suspension/revocationofapprovalorlicence`),
    Hygieneemergencyprohibitionnotice = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneemergencyprohibitionnotice`),
    Prohibitionorder = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Prohibitionorder`),
    Simplecaution = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Simplecaution`),
    Hygieneimprovementnotices = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Hygieneimprovementnotices`),
    Remedialaction = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Remedialaction&detentionnotices`),
    Writtenwarnings = sum(`TotalnumberofestablishmentssubjecttoWrittenwarnings`),
    Prosecutionsconcluded = sum(`Totalnumberofestablishmentssubjecttoformalenforcementactions-Prosecutionsconcluded`)
  )
## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
print(Inve)
## # A tibble: 6 × 12
## # Groups:   Country [3]
##   Country     LAType Volun…¹ Seizure Suspe…² Hygie…³ Prohi…⁴ Simpl…⁵ Hygie…⁶ Remed…⁷ Writt…⁸ Prose…⁹
##   <fct>       <fct>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 England     Distr…     196      63       1      25      12      51     727      17   48305      42
## 2 England     Londo…     167      56       7     108      15      20     612      12   22475      77
## 3 England     Metro…     272      41       2      64       7      36     555      20   30279      48
## 4 England     Unita…     188     197      11      46      13      26     508       8   31370      30
## 5 Northern I… NI Un…      20      19       0       0       0       2      14       8    6747       1
## 6 Wales       Welsh…      98      40       3       4       0      18     196      50   12454      33
## # … with abbreviated variable names ¹​Voluntaryclosure, ²​Suspension,
## #   ³​Hygieneemergencyprohibitionnotice, ⁴​Prohibitionorder, ⁵​Simplecaution,
## #   ⁶​Hygieneimprovementnotices, ⁷​Remedialaction, ⁸​Writtenwarnings, ⁹​Prosecutionsconcluded

The table above gives the total number of establishments subject to formal enforcement actions. Across all local authorities, 151,630 establishments received written warnings and 24 were subject to suspension or revocation of approval or licence. In Northern Ireland, no establishments were subject to suspension, a hygiene emergency prohibition notice, or a prohibition order.
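The totals quoted for written warnings and suspensions can be reproduced from the printed summary table. A minimal sketch, assuming the per-type counts are typed in by hand from the output above; the data frame name `enforcement` is hypothetical and not part of the original script.

```r
# Hypothetical hand-copied subset of the printed `Inve` table
# (two of the ten enforcement columns).
enforcement <- data.frame(
  LAType = c("District Council", "London Borough", "Metropolitan Borough Council",
             "Unitary Authority", "NI Unitary Authority", "Welsh Unitary Authority"),
  Suspension      = c(1, 7, 2, 11, 0, 3),
  Writtenwarnings = c(48305, 22475, 30279, 31370, 6747, 12454)
)

# Column totals across all local authority types.
totals <- colSums(enforcement[, c("Suspension", "Writtenwarnings")])
print(totals)
```

With the real data, the same totals could be obtained by applying `colSums()` to the numeric columns of `Inve`.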

Distribution of the percentage of successful enforcement actions across Local Authorities

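# Histogram of the overall % of interventions achieved, coloured by LAType.
# Each row of df is one local authority, so the y axis counts local authorities.
plot2 <- ggplot(df, aes(`Total%ofInterventionsachieved(premisesratedA-E)`)) +
  geom_histogram(aes(fill = LAType), colour = "black", binwidth = 3) +
  labs(x = "% of Interventions achieved (A-E)", y = "Number of local authorities")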

ggplotly(plot2)

The distribution is left-skewed (negatively skewed): most local authorities achieved a high percentage of their interventions, with a long tail of authorities at lower percentages.
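Skewness can also be checked numerically rather than by eye. A minimal sketch using the moment-based sample skewness on a made-up vector of percentages; both the values in `pct` and the helper `sample_skewness` are illustrative assumptions and are not taken from the dataset or the original script.

```r
# Moment-based sample skewness: third central moment divided by
# the second central moment raised to the power 3/2.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

# Made-up percentages with a few low values and many values near 100.
pct <- c(20, 36, 70, 82, 88, 90, 93, 95, 97, 98, 99, 100)

skew <- sample_skewness(pct)
print(skew)  # a negative value indicates a left (negative) skew
```

On the real data the helper would be called on the `Total%ofInterventionsachieved(premisesratedA-E)` column of `df`.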

Distributions for each establishment rated A, B, C, D, and E individually

# Plots for Different Establishments Rated A-E
dA <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedA`))+geom_histogram(aes(fill=LAType), colour="black", binwidth = 2)+
  labs(x="% of Interventions Achieved (Rated A)") +
  theme(legend.position = 'hidden')

dB <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedB`))+geom_histogram(aes(fill=LAType), colour="black", binwidth = 2)+
  labs(x="% of Interventions Achieved (Rated B)") +
  theme(legend.position = 'hidden')

dC <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedC`))+geom_histogram(aes(fill=LAType), colour="black", binwidth = 2)+
  labs(x="% of Interventions Achieved (Rated C)") +
  theme(legend.position = 'hidden')

dD <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedD`))+geom_histogram(aes(fill=LAType), colour="black", binwidth = 2)+
  labs(x="% of Interventions Achieved (Rated D)") +
  theme(legend.position = 'hidden')

dE <- ggplot(df, aes(`Total%ofInterventionsachieved-premisesratedE`))+geom_histogram(aes(fill=LAType), colour="black", binwidth = 2)+
  labs(x="% of Interventions Achieved (Rated E)") +
  theme(legend.position = 'hidden')


ggarrange(dA,dB,dC,dD,dE,common.legend = TRUE,top = "Individual Impact Level(A-E) Distribution")

A similar left skew appears for each individual rating (A to E): in every rating category, most local authorities achieved the majority of their interventions.

Relationship between each local authority’s number of FTE food safety personnel and the percentage of successful interventions

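# Create one data frame per LAType with the columns required for the analysis.
# factor() orders the levels alphabetically, so the numeric codes are:
# 1 = District Council, 2 = London Borough, 3 = Metropolitan Borough Council,
# 4 = NI Unitary Authority, 5 = Unitary Authority, 6 = Welsh Unitary Authority
df <- df %>%
  mutate(LAType_factored = as.numeric(factor(LAType)))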

df_DC <- df %>%
  filter(LAType_factored == 1) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)

df_LB <- df %>%
  filter(LAType_factored == 2) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)

df_MBC <- df %>%
  filter(LAType_factored == 3) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)

df_NUA <- df %>%
  filter(LAType_factored == 4) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)

df_UA <- df %>%
  filter(LAType_factored == 5) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)


df_WUA <- df %>%
  filter(LAType_factored == 6) %>%
    select(LAType, `Totalestablishments(includingnotyetrated&outside)`, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`) %>%
  mutate(Proportion_emp_est = `ProfessionalFullTimeEquivalentPosts-occupied *`/`Totalestablishments(includingnotyetrated&outside)`)
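The six near-identical blocks above could be collapsed into a single step. A sketch under stated assumptions: `toy` is a made-up stand-in for `df` with shortened, hypothetical column names (`Establishments`, `PctAchieved`, `FTE`), and base R `split()` is swapped in for the repeated `filter()` calls. Splitting on the LAType label also avoids relying on the numeric factor codes.

```r
# Made-up stand-in for df; column names are shortened for illustration only.
toy <- data.frame(
  LAType = c("District Council", "District Council", "London Borough", "London Borough"),
  Establishments = c(1000, 1500, 3000, 2500),
  PctAchieved = c(90, 85, 70, 95),
  FTE = c(2, 3, 6, 5)
)

# FTE posts per establishment, computed once for all rows.
toy$Proportion_emp_est <- toy$FTE / toy$Establishments

# One data frame per local authority type, returned as a named list.
by_type <- split(toy, toy$LAType)

print(names(by_type))
print(by_type[["London Borough"]])
```

Each element of `by_type` plays the role of one of `df_DC`, `df_LB`, and so on, and can be passed to `cor()` or `lm()` in the same way.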

Correlation Between FTE Employees and Percentages of Interventions Achieved.

x <- rcorr(as.matrix(select(df, `Total%ofInterventionsachieved(premisesratedA-E)`, `ProfessionalFullTimeEquivalentPosts-occupied *`)))

print(x)
##                                                 Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E)                                            1.00
## ProfessionalFullTimeEquivalentPosts-occupied *                                            -0.02
##                                                 ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E)                                          -0.02
## ProfessionalFullTimeEquivalentPosts-occupied *                                            1.00
## 
## n= 347 
## 
## 
## P
##                                                 Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E)                                                
## ProfessionalFullTimeEquivalentPosts-occupied *  0.6552                                         
##                                                 ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E) 0.6552                                        
## ProfessionalFullTimeEquivalentPosts-occupied *

Overall, the correlation between the percentage of interventions achieved and the number of FTE employees is close to zero (r = -0.02). The p-value of 0.6552 is well above 0.05, so the correlation is not statistically significant.

ggplot(df, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The smoothed line shows no clear upward or downward trend, indicating no strong relationship between the number of employees and successful interventions.

# Linear Regression  
model_overall <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df)

The p-value of 0.6552 is greater than 0.05, so the model is not statistically significant. R-squared is 0.0005787, meaning the number of FTE posts explains less than 0.1% of the variation in the percentage of interventions achieved. The intercept of 87.1091 is the predicted percentage of interventions achieved for a local authority with 0 FTE posts (about 87%). The slope of -0.1195 means that each additional FTE post is associated with a decrease of about 0.12 percentage points in interventions achieved, which is negligible in practice.

#output of Linear Regression
summary(model_overall)
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -66.304  -4.575   4.067   8.658  13.860 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       87.1091     1.2828  67.905   <2e-16 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.1195     0.2675  -0.447    0.655    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.4 on 345 degrees of freedom
## Multiple R-squared:  0.0005787,  Adjusted R-squared:  -0.002318 
## F-statistic: 0.1998 on 1 and 345 DF,  p-value: 0.6552
cbind(coefficient=coef(model_overall), confint(model_overall))
##                                                  coefficient      2.5 %     97.5 %
## (Intercept)                                       87.1091495 84.5860343 89.6322647
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.1195469 -0.6456029  0.4065092

The results suggest an average decrease of 0.12 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-0.65, 0.41]), so this decrease is not significantly different from zero, t(345) = -0.45, p = 0.655.
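The t value, p-value, and confidence interval reported for the slope all follow from the estimate, its standard error, and the residual degrees of freedom. A small worked sketch using the rounded values printed by `summary(model_overall)`; because the inputs are rounded, the results agree with the printed output only to about three decimal places.

```r
# Values copied from the printed summary of model_overall.
estimate  <- -0.1195   # slope for FTE posts
std_error <- 0.2675    # standard error of the slope
df_resid  <- 345       # residual degrees of freedom

# t statistic and two-sided p-value.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval: estimate +/- t quantile * standard error.
margin <- qt(0.975, df = df_resid) * std_error
ci <- c(lower = estimate - margin, upper = estimate + margin)

print(round(c(t = t_value, p = p_value, ci), 3))
```

The degrees of freedom in the reported t statistic are the residual degrees of freedom (345), not the number of observations (347).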

Correlations Between FTE Employees and Successful Interventions for Each Local Authority Type

# Creating a Correlation Data Frame
FTE_DC <- cor(df_DC$`Total%ofInterventionsachieved(premisesratedA-E)`,df_DC$`ProfessionalFullTimeEquivalentPosts-occupied *`)
FTE_LB <- cor(df_LB$`Total%ofInterventionsachieved(premisesratedA-E)`,df_LB$`ProfessionalFullTimeEquivalentPosts-occupied *`)
FTE_MBC <- cor(df_MBC$`Total%ofInterventionsachieved(premisesratedA-E)`,df_MBC$`ProfessionalFullTimeEquivalentPosts-occupied *`)
FTE_NUA <- cor(df_NUA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_NUA$`ProfessionalFullTimeEquivalentPosts-occupied *`)
FTE_UA <- cor(df_UA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_UA$`ProfessionalFullTimeEquivalentPosts-occupied *`)
FTE_WUA <- cor(df_WUA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_WUA$`ProfessionalFullTimeEquivalentPosts-occupied *`)

print(data.frame(FTE_DC,FTE_LB,FTE_MBC,FTE_NUA,FTE_UA,FTE_WUA))
##       FTE_DC    FTE_LB   FTE_MBC   FTE_NUA      FTE_UA     FTE_WUA
## 1 0.06427808 0.1481189 0.0614555 0.1664413 -0.06561198 -0.06636445

Correlation between the percentage of interventions achieved and FTE employees, by local authority type:
District Council (DC) = 0.06, close to zero
London Borough (LB) = 0.15, a weak positive correlation
Metropolitan Borough Council (MBC) = 0.06, close to zero
NI Unitary Authority (NUA) = 0.17, a weak positive correlation
Unitary Authority (UA) = -0.07, close to zero
Welsh Unitary Authority (WUA) = -0.07, close to zero

# Creating a grid of scatter plots with smoothed trend lines, one per LAType
c1 <- ggplot(df_DC, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at DC")

c2 <- ggplot(df_LB, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at LB")

c3 <- ggplot(df_MBC, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at MBC")

c4 <- ggplot(df_NUA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at NUA")

c5 <- ggplot(df_UA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at UA")

c6 <- ggplot(df_WUA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=`ProfessionalFullTimeEquivalentPosts-occupied *`)) + geom_point() + geom_smooth() + labs(y="Employees", x="Successful Interventions at WUA")

grid.arrange(c1,c2,c3,c4,c5,c6,nrow=3,top = "Employee Relation at Each Local Authority")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'


The correlation between the number of employees and successful interventions is therefore close to zero for DC, MBC, UA, and WUA. For LB and NUA there is a positive association, although it is weak.

# Fit a linear regression model for each Local Authority Type
model_DC <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_DC)

model_LB <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_LB)

model_MBC <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_MBC)

model_NUA <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_NUA)

model_UA <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_UA)

model_WUA <- lm(`Total%ofInterventionsachieved(premisesratedA-E)`~`ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_WUA)
#Linear Model Output of DC
print(summary(model_DC))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_DC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.451  -4.349   3.672   7.447  12.742 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       86.3480     2.1678  39.831   <2e-16 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.6064     0.6884   0.881     0.38    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.7 on 187 degrees of freedom
## Multiple R-squared:  0.004132,   Adjusted R-squared:  -0.001194 
## F-statistic: 0.7758 on 1 and 187 DF,  p-value: 0.3796
cbind(coefficient=coef(model_DC), confint(model_DC))
##                                                  coefficient      2.5 %    97.5 %
## (Intercept)                                       86.3480220 82.0714604 90.624584
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.6063621 -0.7516923  1.964417

There is an average increase of 0.61 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-0.75, 1.96]), so this increase is not significant, t(187) = 0.88, p = 0.38.

#Linear Model Output of LB
print(summary(model_LB))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_LB)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.285  -6.157   6.197  12.916  16.682 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                        77.617      6.808  11.400 1.28e-12 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`    0.948      1.137   0.834    0.411    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.15 on 31 degrees of freedom
## Multiple R-squared:  0.02194,    Adjusted R-squared:  -0.009611 
## F-statistic: 0.6954 on 1 and 31 DF,  p-value: 0.4107
cbind(coefficient=coef(model_LB), confint(model_LB))
##                                                  coefficient     2.5 %   97.5 %
## (Intercept)                                       77.6166429 63.730847 91.50244
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.9480366 -1.370657  3.26673

There is an average increase of 0.95 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-1.37, 3.27]), so this increase is not significant, t(31) = 0.83, p = 0.41.

#Linear Model Output of MBC
print(summary(model_MBC))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_MBC)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.653  -4.841   4.143   8.862  14.009 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       83.2184     5.1618  16.122   <2e-16 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.2937     0.8180   0.359    0.722    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.4 on 34 degrees of freedom
## Multiple R-squared:  0.003777,   Adjusted R-squared:  -0.02552 
## F-statistic: 0.1289 on 1 and 34 DF,  p-value: 0.7218
cbind(coefficient=coef(model_MBC), confint(model_MBC))
##                                                  coefficient     2.5 %    97.5 %
## (Intercept)                                        83.218439 72.728299 93.708580
## `ProfessionalFullTimeEquivalentPosts-occupied *`    0.293698 -1.368777  1.956173

There is an average increase of 0.29 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-1.37, 1.96]), so this increase is not significant, t(34) = 0.36, p = 0.72.

#Linear Model Output of NUA
print(summary(model_NUA))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_NUA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.698  -3.566   4.798   8.699  10.435 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       84.0288     9.8090   8.566 1.28e-05 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.8379     1.6546   0.506    0.625    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.97 on 9 degrees of freedom
## Multiple R-squared:  0.0277, Adjusted R-squared:  -0.08033 
## F-statistic: 0.2564 on 1 and 9 DF,  p-value: 0.6248
cbind(coefficient=coef(model_NUA), confint(model_NUA))
##                                                  coefficient     2.5 %     97.5 %
## (Intercept)                                       84.0288130 61.839237 106.218389
## `ProfessionalFullTimeEquivalentPosts-occupied *`   0.8378766 -2.905125   4.580878

There is an average increase of 0.84 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-2.91, 4.58]), so this increase is not significant, t(9) = 0.51, p = 0.62.

#Linear Model Output of UA
print(summary(model_UA))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_UA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.801  -6.182   4.241  10.188  16.303 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       84.8356     3.5571  23.850   <2e-16 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.2957     0.6120  -0.483    0.631    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.18 on 54 degrees of freedom
## Multiple R-squared:  0.004305,   Adjusted R-squared:  -0.01413 
## F-statistic: 0.2335 on 1 and 54 DF,  p-value: 0.6309
cbind(coefficient=coef(model_UA), confint(model_UA))
##                                                  coefficient     2.5 %     97.5 %
## (Intercept)                                       84.8355578 77.704008 91.9671072
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.2957065 -1.522672  0.9312588

There is an average decrease of 0.30 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-1.52, 0.93]), so this decrease is not significant, t(54) = -0.48, p = 0.63.

#Linear Model Output of WUA
print(summary(model_WUA))
## 
## Call:
## lm(formula = `Total%ofInterventionsachieved(premisesratedA-E)` ~ 
##     `ProfessionalFullTimeEquivalentPosts-occupied *`, data = df_WUA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.605  -1.966   3.430   7.623  10.771 
## 
## Coefficients:
##                                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                                       91.0151     5.2247  17.420 1.48e-13 ***
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.2093     0.7036  -0.297    0.769    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.3 on 20 degrees of freedom
## Multiple R-squared:  0.004404,   Adjusted R-squared:  -0.04538 
## F-statistic: 0.08847 on 1 and 20 DF,  p-value: 0.7692
cbind(coefficient=coef(model_WUA), confint(model_WUA))
##                                                  coefficient     2.5 %     97.5 %
## (Intercept)                                       91.0151486 80.116665 101.913632
## `ProfessionalFullTimeEquivalentPosts-occupied *`  -0.2092914 -1.677032   1.258449

There is an average decrease of 0.21 percentage points in interventions achieved for every additional FTE post. The 95% confidence interval includes zero (95% CI = [-1.68, 1.26]), so this decrease is not significant, t(20) = -0.30, p = 0.77.

Relationship between the Proportion of Employees in each Local Authority and Total % of Interventions

# Creating a Correlation Table for Proportion of employees to Successful Interventions
FTE_PDC <- cor(df_DC$`Total%ofInterventionsachieved(premisesratedA-E)`,df_DC$Proportion_emp_est)
FTE_PLB <- cor(df_LB$`Total%ofInterventionsachieved(premisesratedA-E)`,df_LB$Proportion_emp_est)
FTE_PMBC <- cor(df_MBC$`Total%ofInterventionsachieved(premisesratedA-E)`,df_MBC$Proportion_emp_est)
FTE_PNUA <- cor(df_NUA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_NUA$Proportion_emp_est)
FTE_PUA <- cor(df_UA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_UA$Proportion_emp_est)
FTE_PWUA <- cor(df_WUA$`Total%ofInterventionsachieved(premisesratedA-E)`,df_WUA$Proportion_emp_est)

print(data.frame(FTE_PDC,FTE_PLB,FTE_PMBC,FTE_PNUA,FTE_PUA,FTE_PWUA))
##    FTE_PDC   FTE_PLB FTE_PMBC  FTE_PNUA   FTE_PUA   FTE_PWUA
## 1 0.149695 0.3152731  0.23219 0.4624527 0.2594489 0.01068116

Correlation between the percentage of interventions achieved and the proportion of FTE employees per establishment, by local authority type:
District Council (DC) = 0.15, a weak positive correlation
London Borough (LB) = 0.32, a positive correlation
Metropolitan Borough Council (MBC) = 0.23, a weak positive correlation
NI Unitary Authority (NUA) = 0.46, a positive correlation
Unitary Authority (UA) = 0.26, a weak positive correlation
Welsh Unitary Authority (WUA) = 0.01, close to zero

# Grid of scatter plots for the proportion of employees, one per LAType
c7 <- ggplot(df_DC, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions DC")

c8 <- ggplot(df_LB, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions LB")

c9 <- ggplot(df_MBC, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions MBC")

c10 <- ggplot(df_NUA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions NUA")

c11 <- ggplot(df_UA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions UA")

c12 <- ggplot(df_WUA, aes(x=`Total%ofInterventionsachieved(premisesratedA-E)`, y=Proportion_emp_est)) + geom_point() + geom_smooth() + labs(y="Proportion", x="Successful Interventions WUA")

grid.arrange(c7,c8,c9,c10,c11,c12,nrow=3,top = "Proportion of Employee at Each Local Authority")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

The graphs show a weak to moderate positive correlation between the proportion of employees and successful interventions for DC, LB, MBC, NUA, and UA. For WUA the correlation is close to zero.
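Whether a correlation of a given size can be distinguished from zero depends on the group size. A sketch of the usual t-test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), applied to the coefficients printed above. The group sizes are inferred from the residual degrees of freedom of the per-type regressions (n = df + 2). The helper `cor_p_value` is hypothetical, and this check is an illustration rather than part of the original analysis.

```r
# Two-sided p-value for a Pearson correlation r based on n observations.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Correlations copied from the printed table of proportion correlations.
r <- c(DC = 0.149695, LB = 0.3152731, MBC = 0.23219,
       NUA = 0.4624527, UA = 0.2594489, WUA = 0.01068116)

# Group sizes inferred from the regression outputs (residual df + 2).
n <- c(DC = 189, LB = 33, MBC = 36, NUA = 11, UA = 56, WUA = 22)

p_values <- cor_p_value(r, n)
print(round(p_values, 3))
```

On the real data, `cor.test()` applied to each per-type data frame would give the same test directly.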

Section 2

This report was created to help politicians and managers of the Food Standards Agency understand how establishments under different local authorities in the United Kingdom responded to the intervention measures imposed on them. The dataset initially contains 353 local authorities from three nations (England, Wales, and Northern Ireland). There are six local authority types across these three nations: District Council (DC), London Borough (LB), Metropolitan Borough Council (MBC), NI Unitary Authority (NUA), Unitary Authority (UA), and Welsh Unitary Authority (WUA). The dataset has 38 columns of information about each local authority. Six rows with NA values were removed during cleaning, leaving 347 local authorities for study.

Table of total rated establishments by Local Authority Type in England, Wales and Northern Ireland

## # A tibble: 6 × 7
## # Groups:   Country [3]
##   Country          LAType                        sumA  sumB  sumC  sumD  sumE
##   <fct>            <chr>                        <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 England          District Council               403  5990 32456 73627 80226
## 2 England          London Borough                 398  3809 18087 27932 19161
## 3 England          Metropolitan Borough Council   323  3744 19142 34441 31933
## 4 England          Unitary Authority              241  3954 19246 43951 43555
## 5 Northern Ireland NI Unitary Authority            18   492  3529  6563  7799
## 6 Wales            Welsh Unitary Authority        104  1307 12546  5574 13477

England has the most establishments, spread across four different local authority types. The table also shows the number of establishments rated A to E in each local authority type.

The graph shows the distribution of the percentage of establishments rated Broadly Compliant. In most local authorities, around 90% to 100% of establishments are rated Broadly Compliant. In 4 local authorities in England (Forest of Dean, West Dorset (1), Hartlepool (10), and Isles of Scilly), 100% of establishments are rated Broadly Compliant. In 11 local authorities in England the share is lower, at about 70-90%.

Table of Formal Enforcement action taken on Establishments Under Each Local Authority

## # A tibble: 6 × 12
## # Groups:   Country [3]
##   Country     LAType Volun…¹ Seizure Suspe…² Hygie…³ Prohi…⁴ Simpl…⁵ Hygie…⁶ Remed…⁷ Writt…⁸ Prose…⁹
##   <fct>       <fct>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 England     Distr…     196      63       1      25      12      51     727      17   48305      42
## 2 England     Londo…     167      56       7     108      15      20     612      12   22475      77
## 3 England     Metro…     272      41       2      64       7      36     555      20   30279      48
## 4 England     Unita…     188     197      11      46      13      26     508       8   31370      30
## 5 Northern I… NI Un…      20      19       0       0       0       2      14       8    6747       1
## 6 Wales       Welsh…      98      40       3       4       0      18     196      50   12454      33
## # … with abbreviated variable names ¹​Voluntaryclosure, ²​Suspension,
## #   ³​Hygieneemergencyprohibitionnotice, ⁴​Prohibitionorder, ⁵​Simplecaution,
## #   ⁶​Hygieneimprovementnotices, ⁷​Remedialaction, ⁸​Writtenwarnings, ⁹​Prosecutionsconcluded

The table above gives the total number of establishments subject to formal enforcement actions. Across all local authorities, 151,630 establishments received written warnings and 24 were subject to suspension or revocation of approval or licence. In Northern Ireland, no establishments were subject to suspension, a hygiene emergency prohibition notice, or a prohibition order, and Northern Ireland has fewer enforcement actions overall than England or Wales.

Distribution of the percentage of successful enforcement actions across Local Authorities

The distribution of the percentage of interventions successfully achieved (all impact levels combined) is shown in the figure below.

Most local authorities achieved 80-100% of their interventions. One local authority of LAType London Borough achieved only 20% of its interventions, and one of LAType Unitary Authority achieved only 36%.

It is evident that a large percentage of enforcement interventions are carried out successfully. By implementing the required changes and resolving the issues raised, many establishments are becoming compliant with the legislation. Overall, establishments have responded favourably to the enforcement actions taken.

Distributions for each establishment rated A, B, C, D, and E individually

The distribution of interventions achieved for each individual impact level (A to E) is shown in the figure below.

All the graphs indicate that most interventions were achieved for every rating across the different LATypes. More than 300 local authorities, across all LATypes, achieved around 90% or more of their interventions for A-rated establishments. For E-rated establishments, more local authorities achieved less than 75% of interventions than for any other rating. For ratings A to D, most local authorities of every LAType achieved more than 75% of their interventions.

Relationship between each local authority’s number of FTE food safety personnel and the percentage of successful interventions

##                                                 Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E)                                            1.00
## ProfessionalFullTimeEquivalentPosts-occupied *                                            -0.02
##                                                 ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E)                                          -0.02
## ProfessionalFullTimeEquivalentPosts-occupied *                                            1.00
## 
## n= 347 
## 
## 
## P
##                                                 Total%ofInterventionsachieved(premisesratedA-E)
## Total%ofInterventionsachieved(premisesratedA-E)                                                
## ProfessionalFullTimeEquivalentPosts-occupied *  0.6552                                         
##                                                 ProfessionalFullTimeEquivalentPosts-occupied *
## Total%ofInterventionsachieved(premisesratedA-E) 0.6552                                        
## ProfessionalFullTimeEquivalentPosts-occupied *

The total percentage of interventions achieved and the number of professional full-time equivalent posts have a correlation of r = -0.02, which is effectively zero: there is essentially no linear relationship between them, and the slight negative sign carries little meaning. The p-value of 0.6552 (p > 0.05) shows that the correlation is not statistically significant.
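As an illustration of how a correlation coefficient and its p-value can be obtained together, the sketch below uses base R's `cor.test()` on simulated data. The variables `fte` and `pct_achieved` and their values are hypothetical stand-ins, not the Food Standards Agency data, so the printed numbers will not match the output above.

```r
# Simulated stand-in data: 347 hypothetical local authorities (NOT the FSA data)
set.seed(1)
n <- 347
fte <- runif(n, min = 0.5, max = 20)                      # hypothetical FTE posts
pct_achieved <- pmin(100, rnorm(n, mean = 85, sd = 10))   # hypothetical % achieved

# Pearson correlation together with its significance test
ct <- cor.test(fte, pct_achieved, method = "pearson")

r  <- unname(ct$estimate)   # correlation coefficient
p  <- ct$p.value            # two-sided p-value
ci <- ct$conf.int           # 95% confidence interval for r

cat(sprintf("r = %.2f, p = %.4f, 95%% CI [%.2f, %.2f]\n", r, p, ci[1], ci[2]))
```

A p-value above 0.05 here would be read in the same way as in the report: the data give no evidence of a linear relationship.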

The graph below shows the relationship between successful interventions and FTE employees for each local authority type.

The p-value for the model fitted to each local authority type is greater than 0.05, so none of the relationships is statistically significant. There is therefore no clear relationship between the number of employees and the percentage of interventions successfully achieved.

Relationship between each local authority’s Proportion of FTE food safety personnel and the percentage of effective answers

The correlation between the proportion of successful responses and the number of FTE food safety employees is 0.064278, which is close to zero and indicates only a very weak positive relationship. By LAType, London Borough (0.14811), Metropolitan Borough Council (0.0614555) and Unitary Authority (0.16641) show very weak positive correlations, whereas Northern Ireland Unitary Authority (-0.06561) and Welsh Unitary Authority (-0.0663) show very weak negative correlations.

The correlation between the overall percentage of interventions achieved and the proportion of employees is r = 0.21, a weak positive relationship. The low p-value of 1e-04 shows that it is statistically significant. In the fitted model, the coefficient of 2407.398 means that the overall percentage of interventions achieved is predicted to rise by about 24.07 for every 0.01 increase in the proportion of employees. The intercept of 79.536 is the predicted percentage of interventions achieved (79.536%) when the proportion of employees is zero.

The coefficient estimate of 2407.398 has a 95% confidence interval of [1193.72035, 3621.0758]. Because the interval does not include zero, the positive relationship is statistically significant.
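The arithmetic behind rescaling the slope can be sketched as follows. The coefficient, interval and intercept are copied from the text above; the step of 0.01 and the variable names are chosen only for illustration.

```r
# Values copied from the report text above
slope     <- 2407.398                  # change in % achieved per 1.0 of proportion
ci_slope  <- c(1193.72035, 3621.0758)  # 95% confidence interval for the slope
intercept <- 79.536                    # predicted % achieved at proportion = 0

# Rescale to a more realistic step in the predictor (0.01 is an illustrative choice)
step <- 0.01
effect_per_step <- slope * step
ci_per_step     <- ci_slope * step

cat(sprintf("Per %.2f increase: %.2f points, 95%% CI [%.2f, %.2f]\n",
            step, effect_per_step, ci_per_step[1], ci_per_step[2]))
```

Rescaling the interval along with the point estimate keeps the uncertainty visible: the whole interval stays above zero, but it is wide.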

In summary, the proportion of food safety employees gives some indication of how successfully enforcement actions are achieved, although the relationship is weak. The raw number of food safety employees, by contrast, shows no significant relationship with the percentage of interventions successfully achieved.

Scenario 2- Publisher Statistics

This is a report for a manager of a publishing company with specific analysis.

The data provided contains information on e-book sales over a period of many months. Each row in the data represents one book. The values of the variables are taken across the entire time period, so daily.sales is the average number of sales (minus refunds) across all days in the period, and sale.price is the average price for which the book sold across all sales in the period.

Section 1

Data Dictionary

Variable Name Description
genre The book’s genre.
avg.review The average review score of the e-book.
daily.sales The average number of sales (minus refunds) across all days in the period.
total.reviews The total number of reviews of the e-book.
sale.price The average price for which the book sold across all sales in the period.

Data

bd <- read_csv("publisher_sales.csv")
## Rows: 6000 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): sold by, publisher.type, genre
## dbl (4): avg.review, daily.sales, total.reviews, sale.price
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

EDA

str(bd)
## spc_tbl_ [6,000 × 7] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ sold by       : chr [1:6000] "Random House LLC" "Amazon Digital Services,  Inc." "Amazon Digital Services,  Inc." "Amazon Digital Services,  Inc." ...
##  $ publisher.type: chr [1:6000] "big five" "indie" "small/medium" "small/medium" ...
##  $ genre         : chr [1:6000] "childrens" "non_fiction" "non_fiction" "fiction" ...
##  $ avg.review    : num [1:6000] 4.44 4.19 3.71 4.72 4.65 4.81 4.33 4.21 3.95 4.66 ...
##  $ daily.sales   : num [1:6000] 61.5 74.9 66 85.2 37.7 ...
##  $ total.reviews : num [1:6000] 92 130 118 179 111 106 205 86 161 81 ...
##  $ sale.price    : num [1:6000] 8.03 9.08 9.48 12.32 5.78 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `sold by` = col_character(),
##   ..   publisher.type = col_character(),
##   ..   genre = col_character(),
##   ..   avg.review = col_double(),
##   ..   daily.sales = col_double(),
##   ..   total.reviews = col_double(),
##   ..   sale.price = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
#Genre should be of factor datatype instead of character
bd$genre <- as.factor(bd$genre)
levels(bd$genre)
## [1] "childrens"   "fiction"     "non_fiction"
summary(bd)
##    sold by          publisher.type             genre        avg.review     daily.sales    
##  Length:6000        Length:6000        childrens  :2000   Min.   :0.000   Min.   : -0.53  
##  Class :character   Class :character   fiction    :2000   1st Qu.:4.100   1st Qu.: 56.77  
##  Mode  :character   Mode  :character   non_fiction:2000   Median :4.400   Median : 74.29  
##                                                           Mean   :4.267   Mean   : 79.11  
##                                                           3rd Qu.:4.620   3rd Qu.: 98.02  
##                                                           Max.   :4.980   Max.   :207.98  
##  total.reviews     sale.price    
##  Min.   :  0.0   Min.   : 0.740  
##  1st Qu.:105.0   1st Qu.: 7.140  
##  Median :128.0   Median : 8.630  
##  Mean   :132.6   Mean   : 8.641  
##  3rd Qu.:163.0   3rd Qu.:10.160  
##  Max.   :248.0   Max.   :17.460

The daily sales average contains negative values, which is not possible, so we remove the values less than zero. A few books also have 0 total reviews and an average review of 0.

#Omitting negative values
bd <- bd %>% filter(daily.sales > 0)
summary(bd)
##    sold by          publisher.type             genre        avg.review     daily.sales    
##  Length:5999        Length:5999        childrens  :2000   Min.   :0.000   Min.   :  3.49  
##  Class :character   Class :character   fiction    :2000   1st Qu.:4.100   1st Qu.: 56.78  
##  Mode  :character   Mode  :character   non_fiction:1999   Median :4.400   Median : 74.30  
##                                                           Mean   :4.267   Mean   : 79.12  
##                                                           3rd Qu.:4.620   3rd Qu.: 98.02  
##                                                           Max.   :4.980   Max.   :207.98  
##  total.reviews     sale.price    
##  Min.   :  0.0   Min.   : 0.740  
##  1st Qu.:105.0   1st Qu.: 7.140  
##  Median :128.0   Median : 8.630  
##  Mean   :132.6   Mean   : 8.641  
##  3rd Qu.:163.0   3rd Qu.:10.160  
##  Max.   :248.0   Max.   :17.460
# Histogram of the sale price of the books

ggplot(data = bd, aes(x = sale.price)) + geom_histogram(binwidth = 0.2)

# Histogram of average review scores, by publisher type

ggplot(data = bd, aes(x = avg.review, fill = publisher.type,alpha = 0.1)) + geom_histogram(binwidth = 0.1,position = 'dodge')

Books sold by small and medium-sized publishers have comparatively high average reviews. The plot also shows that several books have an average review of zero.

Daily sales of ebooks, broken down by genre

Sales.Genre <- bd %>% group_by(genre) %>%
  summarise(average.sales = mean(daily.sales))


ggplot(Sales.Genre, aes(x=genre, y=average.sales)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = average.sales), vjust = -0.2) +
  labs(x="Genre of Books", y="Avg Sales", title = "Average e-Book Sales for each Genre ")

Fiction has the highest average daily sales of the three genres.

Daily sales vary with the genre of the book: fiction books sell the most, followed by non-fiction and then children’s books.

Sales based on average review scores and overall review count

rcorr(as.matrix(bd %>% select(avg.review, daily.sales, total.reviews, sale.price)))
##               avg.review daily.sales total.reviews sale.price
## avg.review          1.00       -0.01          0.10      -0.02
## daily.sales        -0.01        1.00          0.66      -0.28
## total.reviews       0.10        0.66          1.00      -0.26
## sale.price         -0.02       -0.28         -0.26       1.00
## 
## n= 5999 
## 
## 
## P
##               avg.review daily.sales total.reviews sale.price
## avg.review               0.6862      0.0000        0.2450    
## daily.sales   0.6862                 0.0000        0.0000    
## total.reviews 0.0000     0.0000                    0.0000    
## sale.price    0.2450     0.0000      0.0000
grid.arrange(
ggplot(bd, aes(x=daily.sales, y=avg.review)) + geom_point() + geom_smooth() + labs(y="Average Review", x="Daily Sales"),

ggplot(bd, aes(x=daily.sales, y=total.reviews)) + geom_point() + geom_smooth() + labs(y="Total Review", x="Daily Sales"),

ggplot(bd, aes(x=avg.review, y=total.reviews)) + geom_point() + geom_smooth() + labs(y="Total Review", x="Average Review"),

nrow=3, top="Necessary Correlations")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

There is no meaningful correlation between average review and daily sales (r = -0.01), and only a very weak one between average review and total reviews (r = 0.10), so average review and total reviews can be treated as largely independent of each other. There is a clear positive correlation between total reviews and daily sales (r = 0.66).

# Creating a correlation data frame
cor_avd <- cor(bd$avg.review,bd$daily.sales)
cor_tvd <- cor(bd$total.reviews,bd$daily.sales)
cor_avt <- cor(bd$total.reviews,bd$avg.review)

print(data.frame(cor_avd,cor_tvd,cor_avt))
##       cor_avd   cor_tvd   cor_avt
## 1 -0.00521738 0.6638385 0.1044134

The correlation between total reviews and daily sales is 0.66, a moderately strong positive correlation.

#Performing Multiple Linear Regression to predict the daily sales based on average and total reviews
m1 <- lm(daily.sales ~ avg.review + total.reviews, data = bd)
summary(m1)
## 
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.407  -14.656   -1.071   13.672  122.177 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   24.123430   2.340120  10.309  < 2e-16 ***
## avg.review    -3.999637   0.512874  -7.798 7.34e-15 ***
## total.reviews  0.543327   0.007816  69.517  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.58 on 5996 degrees of freedom
## Multiple R-squared:  0.4463, Adjusted R-squared:  0.4461 
## F-statistic:  2416 on 2 and 5996 DF,  p-value: < 2.2e-16
cbind(coef(m1),confint(m1))
##                              2.5 %     97.5 %
## (Intercept)   24.123430 19.5359532 28.7109077
## avg.review    -3.999637 -5.0050539 -2.9942200
## total.reviews  0.543327  0.5280054  0.5586487

When both total reviews and average review are included in the same regression, each effect is estimated while controlling for the other. An increase of 1 in total reviews predicts 0.543 additional daily sales (t(5996) = 69.52, p < 0.001, 95% CI [0.53, 0.56]), and an increase of 1 point in average review predicts a decrease in daily sales of 4.00 (t(5996) = -7.80, p < 0.001, 95% CI [-5.01, -2.99]).
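The figures quoted in this style of write-up (estimate, t statistic, degrees of freedom, p-value and confidence interval) can all be pulled from a fitted `lm` object. The sketch below shows one way to do so on simulated data; the data frame `sim` and its values are hypothetical and are not taken from publisher_sales.csv.

```r
# Simulated stand-in for the book data (hypothetical values, NOT publisher_sales.csv)
set.seed(42)
n <- 500
sim <- data.frame(
  avg.review    = runif(n, 3, 5),
  total.reviews = rpois(n, 130)
)
sim$daily.sales <- 20 + 0.5 * sim$total.reviews + rnorm(n, sd = 20)

fit <- lm(daily.sales ~ avg.review + total.reviews, data = sim)

coefs <- summary(fit)$coefficients  # estimate, std. error, t value, p-value
ci    <- confint(fit)               # 95% confidence intervals by default
df    <- df.residual(fit)           # degrees of freedom for the t statistics

# Print one reporting line per predictor
for (term in c("avg.review", "total.reviews")) {
  cat(sprintf("%s: b = %.2f, t(%d) = %.2f, p = %.3g, 95%% CI [%.2f, %.2f]\n",
              term, coefs[term, "Estimate"], df, coefs[term, "t value"],
              coefs[term, "Pr(>|t|)"], ci[term, 1], ci[term, 2]))
}
```

Taking the numbers from the model object rather than retyping them from the console output helps avoid rounding slips in the reported statistics.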

Because some books in the data have zero total reviews (and an average review of zero), we fit another model without them.

#Performing Multiple Linear Regression to predict the daily sales based on average and total reviews, after removing books with 0 reviews
bd_null <- bd %>% filter(total.reviews != 0)

m2 <- lm(daily.sales ~ avg.review + total.reviews, data = bd_null)
summary(m2)
## 
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd_null)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -104.341  -14.628   -0.752   13.829   93.489 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.476624   2.650690   2.066   0.0389 *  
## avg.review    -0.366436   0.565700  -0.648   0.5172    
## total.reviews  0.564835   0.007831  72.127   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.2 on 5973 degrees of freedom
## Multiple R-squared:  0.4655, Adjusted R-squared:  0.4653 
## F-statistic:  2601 on 2 and 5973 DF,  p-value: < 2.2e-16
cbind(coef(m2),confint(m2))
##                               2.5 %     97.5 %
## (Intercept)    5.4766239  0.2803136 10.6729342
## avg.review    -0.3664363 -1.4754132  0.7425405
## total.reviews  0.5648349  0.5494831  0.5801868

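With the zero-review books removed, the effect of average review is no longer significant: a 1-point increase in average review predicts a change in daily sales of -0.366 (t(5973) = -0.65, p = 0.517, 95% CI [-1.48, 0.74]), and the interval includes zero. Daily sales still increase by 0.56 for every additional review (t(5973) = 72.13, p < 0.001, 95% CI [0.55, 0.58]).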

Effect of sale price upon the number of sales, and across different genres

#Correlation between number of sales and sale price
rcorr(as.matrix(select(bd, sale.price, daily.sales)))
##             sale.price daily.sales
## sale.price        1.00       -0.28
## daily.sales      -0.28        1.00
## 
## n= 5999 
## 
## 
## P
##             sale.price daily.sales
## sale.price              0         
## daily.sales  0
ggplot(data = bd, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title = expression(r == -0.28))
## `geom_smooth()` using formula = 'y ~ x'

There is a weak negative correlation (r = -0.28) between daily sales and sale price, which is statistically significant (p < 0.05).

# Sales Price VS Daily Sales Graph 
ggplot(data = bd, aes(x = sale.price, y = daily.sales, color = genre)) + geom_point(alpha = 0.1) + geom_smooth(method = lm)
## `geom_smooth()` using formula = 'y ~ x'

#Data frames for each genre (the children's data frame is not named `c`, which would shadow the base function c())
childrens <- bd %>% filter(genre == 'childrens')
f <- bd %>% filter(genre == 'fiction')
nf <- bd %>% filter(genre == 'non_fiction')

k1 <- ggplot(data = childrens, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price", title ='Children')

k2 <- ggplot(data =f, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title = "Fiction")

k3 <- ggplot(data = nf, aes(x = daily.sales, y = sale.price)) + geom_point() + geom_smooth(method = lm) + labs(x= "Daily Sales", y = "Sales Price",title="Non-Fiction")

grid.arrange(k1,k2,k3, ncol=3)
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

For the children’s genre there is a negative correlation between sale price and daily sales. No strong correlation exists for the other genres.

#Simple model with just daily.sales as a function of sale.price
ds_sp <- lm(daily.sales ~ sale.price, data = bd)
#Output of Model
print(summary(ds_sp))
## 
## Call:
## lm(formula = daily.sales ~ sale.price, data = bd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.760 -20.644  -4.638  17.084 130.301 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 112.0540     1.5201   73.72   <2e-16 ***
## sale.price   -3.8110     0.1704  -22.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 5997 degrees of freedom
## Multiple R-squared:  0.07696,    Adjusted R-squared:  0.0768 
## F-statistic:   500 on 1 and 5997 DF,  p-value: < 2.2e-16
print(cbind(coef(ds_sp),confint(ds_sp)))
##                             2.5 %     97.5 %
## (Intercept) 112.054023 109.074077 115.033968
## sale.price   -3.810984  -4.145101  -3.476867
print(( ds_sp_emm <- emmeans(ds_sp, ~sale.price) ))
##  sale.price emmean    SE   df lower.CL upper.CL
##        8.64   79.1 0.376 5997     78.4     79.9
## 
## Confidence level used: 0.95

The estimate for sale.price is negative: daily sales are predicted to decrease by 3.81 for every 1-unit increase in sale price (95% CI [-4.15, -3.48]).
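As a quick check on how the coefficients and the estimated marginal mean fit together, the prediction at the mean sale price can be reproduced by hand. The coefficients are copied from the model output above and the mean price from the earlier `summary(bd)` output; the helper `predict_sales` is a hypothetical name introduced only for this sketch.

```r
# Coefficients copied from the ds_sp model output above
intercept  <- 112.054023
slope      <- -3.810984
mean_price <- 8.641       # mean sale.price reported by summary(bd)

# Hypothetical helper: predicted daily sales at a given sale price
predict_sales <- function(price) intercept + slope * price

at_mean         <- predict_sales(mean_price)               # compare with emmean of 79.1
one_unit_change <- predict_sales(mean_price + 1) - at_mean # equals the slope

cat(sprintf("Predicted daily sales at mean price: %.1f\n", at_mean))
cat(sprintf("Change for a 1-unit price increase: %.2f\n", one_unit_change))
```

The change is expressed per unit of sale.price, in whatever currency unit the data uses, rather than per percent.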

#Model including genre
g1 <- lm(daily.sales ~ sale.price + genre, data = bd)
( g1.emm <- emmeans(g1, ~sale.price) )
##  sale.price emmean    SE   df lower.CL upper.CL
##        8.64   79.1 0.286 5995     78.6     79.7
## 
## Results are averaged over the levels of: genre 
## Confidence level used: 0.95
#ANOVA method to compare models

anova(ds_sp, g1)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price + genre
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1   5997 5097347                                  
## 2   5995 2944127  2   2153220 2192.3 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Interaction Terms
int <- lm(daily.sales ~ sale.price * genre, data = bd)
summary(int)
## 
## Call:
## lm(formula = daily.sales ~ sale.price * genre, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -102.383  -13.374    0.018   13.042  102.366 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  72.8781     2.5003  29.147  < 2e-16 ***
## sale.price                   -1.7319     0.2453  -7.059 1.87e-12 ***
## genrefiction                 35.1993     3.2711  10.761  < 2e-16 ***
## genrenon_fiction              6.3974     3.2015   1.998 0.045736 *  
## sale.price:genrefiction       1.4587     0.3543   4.118 3.88e-05 ***
## sale.price:genrenon_fiction   1.3057     0.3467   3.766 0.000167 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.13 on 5993 degrees of freedom
## Multiple R-squared:  0.4687, Adjusted R-squared:  0.4683 
## F-statistic:  1057 on 5 and 5993 DF,  p-value: < 2.2e-16
anova(ds_sp, g1,int)
## Analysis of Variance Table
## 
## Model 1: daily.sales ~ sale.price
## Model 2: daily.sales ~ sale.price + genre
## Model 3: daily.sales ~ sale.price * genre
##   Res.Df     RSS Df Sum of Sq        F    Pr(>F)    
## 1   5997 5097347                                    
## 2   5995 2944127  2   2153220 2199.194 < 2.2e-16 ***
## 3   5993 2933858  2     10269   10.489 2.836e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

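When predicting daily.sales, there is a significant interaction between sale.price and genre: adding the interaction terms improves the model (F(2, 5993) = 10.489, p < 0.001). Both sale.price:genre coefficients are positive relative to the baseline genre (childrens), whose price slope is -1.73. This means that the negative effect of sale.price on daily sales is strongest for children’s books and much weaker for fiction and non-fiction.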
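One way to read the interaction coefficients is to convert them into a price slope for each genre: the baseline slope applies to childrens, and each interaction term is added to it for the other genres. The sketch below does this arithmetic with the coefficients copied from the `int` model output above.

```r
# Coefficients copied from summary(int) above
coefs <- c(
  "sale.price"                  = -1.7319,  # slope for the baseline genre (childrens)
  "sale.price:genrefiction"     =  1.4587,  # slope difference: fiction vs childrens
  "sale.price:genrenon_fiction" =  1.3057   # slope difference: non_fiction vs childrens
)

# Genre-specific slope = baseline slope + that genre's interaction term
slopes <- c(
  childrens   = coefs[["sale.price"]],
  fiction     = coefs[["sale.price"]] + coefs[["sale.price:genrefiction"]],
  non_fiction = coefs[["sale.price"]] + coefs[["sale.price:genrenon_fiction"]]
)

print(round(slopes, 4))
```

All three slopes remain negative, but the slope for childrens is several times larger in magnitude than the other two.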

Section 2

The purpose of this report is to help the publishing company’s managers comprehend the progress in e-book sales. The initial set of data included 6000 e-books and seven variables that tracked e-book sales over time.

1) Do books from different genres have different daily sales on average?

E-books from all three genres (children’s, fiction, and non-fiction) are included in the data. Below are the average daily sales for each of these genres:

## # A tibble: 3 × 2
##   genre       average.sales
##   <fct>               <dbl>
## 1 childrens            55.6
## 2 fiction             106. 
## 3 non_fiction          75.9

The average daily sales across all books during this time period are 79.1, and each of the three genres differs noticeably from this value.

Daily sales differ considerably between genres. Fiction (106) has the highest average daily sales, followed by non-fiction (75.9) and children’s (55.6).

2) Do books have more/fewer sales depending upon their average review scores and total number of reviews?

The graph below shows the relationships between average review, total reviews and daily sales.

There is a moderately strong positive correlation between daily sales and total reviews (r = 0.66), but no strong correlation between the other pairs of variables. Books with more reviews therefore tend to have higher daily sales.

To examine these relationships further, we consider the multiple linear regression model:

## 
## Call:
## lm(formula = daily.sales ~ avg.review + total.reviews, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.407  -14.656   -1.071   13.672  122.177 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   24.123430   2.340120  10.309  < 2e-16 ***
## avg.review    -3.999637   0.512874  -7.798 7.34e-15 ***
## total.reviews  0.543327   0.007816  69.517  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.58 on 5996 degrees of freedom
## Multiple R-squared:  0.4463, Adjusted R-squared:  0.4461 
## F-statistic:  2416 on 2 and 5996 DF,  p-value: < 2.2e-16
##                              2.5 %     97.5 %
## (Intercept)   24.123430 19.5359532 28.7109077
## avg.review    -3.999637 -5.0050539 -2.9942200
## total.reviews  0.543327  0.5280054  0.5586487

This shows that books with more reviews sell more: each additional review is associated with an increase of 0.54 in daily sales. A higher average review, by contrast, is associated with lower daily sales in this model (a decrease of about 4.00 per point), although this effect is no longer significant once books with zero reviews are excluded.

3) What is the effect of sale price upon the number of sales, and is this different across genres?

The graph below shows the correlation between daily sales and sale price.

A weak negative correlation is visible: books with a higher sale price tend to have lower daily sales. We examine this further using linear regression and estimated means.

## 
## Call:
## lm(formula = daily.sales ~ sale.price, data = bd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -80.760 -20.644  -4.638  17.084 130.301 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 112.0540     1.5201   73.72   <2e-16 ***
## sale.price   -3.8110     0.1704  -22.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.15 on 5997 degrees of freedom
## Multiple R-squared:  0.07696,    Adjusted R-squared:  0.0768 
## F-statistic:   500 on 1 and 5997 DF,  p-value: < 2.2e-16
##                             2.5 %     97.5 %
## (Intercept) 112.054023 109.074077 115.033968
## sale.price   -3.810984  -4.145101  -3.476867
##  sale.price emmean    SE   df lower.CL upper.CL
##        8.64   79.1 0.376 5997     78.4     79.9
## 
## Confidence level used: 0.95

Since the estimate for sale.price is negative, a 1-unit increase in sale price is predicted to reduce daily sales by 3.81.

The relationship between sale price and daily sales for each genre is shown below:

To be more precise, the same relationship is shown for each genre individually:

There is no strong correlation between sale price and daily sales for the fiction and non-fiction genres, but there is a negative correlation for the children’s genre, i.e. daily sales decrease as the sale price increases.

Furthermore, the impact of sale price on daily sales varies between genres, since the interaction between sale price and genre is statistically significant.